Load in some packages
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
theme_set(theme_bw(base_size = 18))
library(lavaan)
## This is lavaan 0.5-20
## lavaan is BETA software! Please report any bugs.
library(semPlot)
## Warning: running command ''/usr/bin/otool' -L '/Library/Frameworks/
## R.framework/Resources/library/tcltk/libs//tcltk.so'' had status 69
## Warning: replacing previous import by 'ggplot2::unit' when loading 'Hmisc'
## Warning: replacing previous import by 'ggplot2::arrow' when loading 'Hmisc'
## Warning: replacing previous import by 'scales::alpha' when loading 'Hmisc'
library(corrplot)
plot_matrix <- function(matrix_toplot){
corrplot(matrix_toplot, is.corr = FALSE,
type = 'lower',
order = "original",
tl.col='black', tl.cex=.75)
}
Structural equation modeling (SEM) can be a very useful tool for determining relationships between variables. Often SEM is used in a “confirmatory” manner, when determining whether a certain model is valid (i.e., comparing the goodness-of-fit of nested models). We can even extend SEM to study interactions across groups. SEM is sometimes referred to as causal modeling, path analysis (with latent variables), or covariance structure analysis. It subsumes a bunch of other techniques, like multiple regression, confirmatory factor analysis, ANOVA, etc.
You supply the observed relationship between variables (i.e., the covariance or correlation matrix), the number of observations, and a formal model specification, and SEM estimates the parameters that give you the “best” reproduction of the covariance matrix. The better your model fit, the better your reproduction of the covariance matrix (hence, lower chi-squared = better model)!
For more information on how to conduct classic mediation with lavaan, check out the tutorial here.
Often we are interested in investigating latent, abstract variables (like “intelligence”) by obtaining multiple observable measures (e.g., high school GPA, SAT and ACT scores). Using SEM we can easily include latent variables!
SEM necessitates large sample sizes! In the literature, sample sizes commonly run 200 - 400 for models with 10 - 15 indicators. One survey of 72 SEM studies found the median sample size was 198. A sample of 150 is considered too small unless the covariance coefficients are relatively large. With over ten variables, sample size under 200 generally means parameter estimates are unstable and significance tests lack power.
One rule of thumb found in the literature is that sample size should be at least 50 more than 8 times the number of variables in the model. Mitchell (1993) advances the rule of thumb that there be 10 to 20 times as many cases as variables. Another rule of thumb, based on Stevens (1996), is to have at least 15 cases per measured variable or indicator. The researcher should go beyond these minimum sample size recommendations particularly when data are non-normal (skewed, kurtotic) or incomplete. Note also that to compute the asymptotic covariance matrix, one needs \(\frac{k(k+1)}{2}\) observations, where \(k\) is the number of variables.
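These rules of thumb are easy to compute directly; here's a quick sanity check for a model with 7 measured variables (the variable names below are our own, not from any package):

```r
# Minimum-sample-size rules of thumb for a model with k measured variables
k <- 7                              # e.g., the 7 variables in the COMT example below
rule_8k_plus_50 <- 8 * k + 50       # at least 50 more than 8 times the number of variables
rule_15_per_var <- 15 * k           # Stevens (1996): at least 15 cases per measured variable
asy_cov_minimum <- k * (k + 1) / 2  # observations needed for the asymptotic covariance matrix
c(rule_8k_plus_50, rule_15_per_var, asy_cov_minimum)  # 106 105 28
```

So even the loosest requirement (the asymptotic covariance matrix) needs 28 observations here, while the regression-style rules push the minimum past 100.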
Kievit, R. A., Davis, S. W., Mitchell, D. J., Taylor, J. R., Duncan, J., Henson, R. N., & Cam-CAN Research team. (2014). Distinct aspects of frontal lobe structure mediate age-related differences in fluid intelligence and multitasking. Nature communications, 5.
“Following best practice in SEM, we first report our measurement model on the full sample of N=567 (for further details on the sample and model fitting, see Methods). For this model, we hypothesize that two latent variables (fluid intelligence and multitasking; tasks shown in Fig. 1, see Methods for more detail) capture the covariance between the six behavioural variables described in the Methods section, freely estimating every factor loading. This model fits the data well, \(\chi^2\) = 15.40, degrees of freedom (df) = 8, P = 0.052, root mean square error of approximation (RMSEA) = 0.04 (0.00–0.070), comparative fit index (CFI) = 0.993, standardized root mean square residual (SRMR)=0.023, Satorra–Bentler scaling factor=1.009. As the two latent factors are positively correlated (standardized estimate = 0.325, Z = 6.17, P<0.001), we can ask whether a more parsimonious model with only a single cognitive factor (for example, ‘executive function’) shows better fit. Such a model would be compatible with a unitary perspective on the age-related decline of higher cognitive function. However, this one-factor model fits very poorly: \(\chi^2\) =334.149, df=9, P<0.00001, RMSEA=0.252 (0.231–0.275), CFI = 0.685, SRMR = 0.121, Satorra–Bentler scaling factor = 1.109, significantly worse than the two-factor model (\(\chi^2\) = 46.224, dfdiff = 1, P<0.00001).”
“For this reason, partial least squares is often considered a pragmatic alternative to the more principled, theory-driven SEM approach, which is why we here prefer the latter to test our a priori conceptualization”
“SEM were fit using the package Lavaan in R, plots were generated using ggplot2 (ref. 69). We used the following guidelines for judging good fit: RMSEA<0.05 (acceptable: 0.05–0.08), CFI>0.97 (acceptable: 0.95–0.97) and SRMR<0.05 (acceptable: 0.05–0.10) and report the Satorra–Bentler scaling factor for each fitted model. All models were fit using Maximum Likelihood Estimation using robust s.e., and report overall model fit using the Satorra–Bentler scaled test statistic”
Here we have data on environmental, behavioral, genetic, and control variables. The main environmental risk was adolescent-onset use of cannabis (earlyon = 1/0, i.e., yes/no). The variable schizo (= 1/0) indexes whether or not a person is diagnosed with schizophreniform disorder at age 26, and psysym is a self-report of the severity of psychotic symptoms at that age. The specific gene used to define the genotypes was the COMT gene, which has two alleles, namely, valine (V) and methionine (M), that affect the level of dopamine activity. We can treat “genotype” as a quantitative variable (COMT) with values 0 (for MM individuals), 1 (for VM) and 2 (for VV). Many control variables were measured, such as chpsyq, a quantitative measure of the severity of childhood psychotic symptoms.
Lavaan can take a correlation or covariance matrix (or even a dataframe) as input. Here, we have a covariance matrix. Remember to look at your data!
d = read.table("http://www.stanford.edu/class/psych253/data/comt-covar7.txt", sep = "\t")
lower = '
.183
.007 .500
.009 .001 .035
.556 .207 .018 13.025
.061 .010 -.007 .466 .164
.030 -.275 .096 .899 .216 25.646
.188 .134 .015 .695 .060 -.163 .321'
labels1 = c("EarlyOn", "COMT", "Schizo", "Psysym", "Conduct", "Chpsyq", "GenxEnv")
comt7.cov = getCov(lower, names = labels1); comt7.cov
## EarlyOn COMT Schizo Psysym Conduct Chpsyq GenxEnv
## EarlyOn 0.183 0.007 0.009 0.556 0.061 0.030 0.188
## COMT 0.007 0.500 0.001 0.207 0.010 -0.275 0.134
## Schizo 0.009 0.001 0.035 0.018 -0.007 0.096 0.015
## Psysym 0.556 0.207 0.018 13.025 0.466 0.899 0.695
## Conduct 0.061 0.010 -0.007 0.466 0.164 0.216 0.060
## Chpsyq 0.030 -0.275 0.096 0.899 0.216 25.646 -0.163
## GenxEnv 0.188 0.134 0.015 0.695 0.060 -0.163 0.321
plot_matrix(comt7.cov)
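Under the hood, getCov() expands a row-wise lower triangle into a full symmetric matrix. Here's a base-R sketch of the same idea, using just the first three variables for brevity (this is our own illustration, not lavaan's actual implementation):

```r
# Expand a row-wise lower triangle into a full symmetric covariance matrix
lower3 <- c(.183,
            .007, .500,
            .009, .001, .035)
m <- matrix(0, 3, 3)
m[upper.tri(m, diag = TRUE)] <- lower3   # row-wise lower triangle == column-wise upper triangle
m <- m + t(m) - diag(diag(m))            # mirror across the diagonal to symmetrize
dimnames(m) <- list(c("EarlyOn", "COMT", "Schizo"),
                    c("EarlyOn", "COMT", "Schizo"))
m
```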
Before running SEM, it’s helpful to plan out your hypothesized “model” of how the variables are related. For instance, run lm() or glm() models to get a sense for the relationships! Better yet, have a priori hypotheses before collecting data (based on the literature, etc.), and you can generate your model before you even have data!
It’s often helpful to sketch out your model beforehand, so grab a piece of paper (or find a whiteboard) for the homework, and use that to draw out your models! Remember that circles are used to describe latent variables, and squares for observed variables. Some predictor variables can be exogenous, meaning they are of “external origin” and their causes are not included in the model (i.e., no arrows are pointing towards it). The remaining variables are endogenous (e.g., a DV), meaning that they are the effects of other variables; in SEM, endogenous variables can also predict other variables!
Here we have a model that we want to recreate:
As a note, remember that the number of parameters you can estimate depends on the number of variables! If you have k variables, you can estimate \(\frac{k(k-1)}{2}\) parameters. For instance, if you have 3 variables, you can estimate \(\frac{3 \cdot 2}{2} = 3\) parameters. If you have 6 variables, you can estimate \(\frac{6 \cdot 5}{2} = 15\) parameters. Here, we have 7 variables, so we can estimate up to \(\frac{7 \cdot 6}{2} = 21\) parameters (though we’ll probably need fewer!).
Similar to lm() and lmer(), etc., the model format has the DV on the left side of the operator, and the IV on the right side. In terms of the structural diagram, you could think of the arrow going from right to left. When the operator is a ~ this is regression, such that the DV(s) are “predicted by” the IV(s). When the operator is =~ we are defining a latent variable, read as: L is “measured by” its indicators. Variances/covariances are specified with a ~~ (e.g., DV is “correlated with” DV), and intercepts are simple regression formulas with 1 as the predictor. We include all these formulas inside single quotes.
Y ~ X1 + X2
Y1 + Y2 ~ X1
L =~ U + V
X ~~ X
X ~~ Y1
X ~~ 0*Y2
X ~ 1
For some practice generating the structural plot and equations, go here!
comt7.model1 = '
Schizo ~ EarlyOn + GenxEnv + Psysym
Psysym ~ EarlyOn + Conduct + Schizo
Conduct ~ EarlyOn
Chpsyq ~ COMT
'
In this model, we want to specify the following:
- Schizo is predicted by EarlyOn, GenxEnv, and Psysym
- Psysym is predicted by EarlyOn, Conduct, and Schizo
- Conduct is predicted by EarlyOn
- Chpsyq is predicted by COMT
Note that this model doesn’t include any definitions of latent variables, or explicit variances/covariances. However, the variances can still be estimated, and the covariances between exogenous variables can still be estimated, as we’ll see in a second!
Once we have the model specified, the covariance matrix all set, and we know the number of observations (n=803 in this case), we can actually fit the model!
Note that as long as we have fixed.x = F, the model will estimate the mean, variance, and covariance of your IVs as free parameters automatically. These parameters must be explicitly removed if you don’t want them.
comt7.fit1 = sem(comt7.model1, fixed.x = F,
sample.cov = comt7.cov,
sample.nobs = 803)
## Found more than one class "Model" in cache; using the first, from namespace 'lavaan'
# To fit as in Kievit et al. (2014), use ML estimation with robust standard errors (this requires the full dataset, so we can't actually run it here!)
# comt7.fit1b = sem(comt7.model1, fixed.x = F,
# sample.cov = comt7.cov,
# sample.nobs = 803,
# estimator = 'MLM')
# # estimator = maximum likelihood estimation with robust standard errors and a Satorra-Bentler scaled test statistic. For complete data only.
# summary(comt7.fit1b, fit.measures = TRUE) # even more fit info
#summary(comt7.fit1)
summary(comt7.fit1, fit.measures = TRUE) # even more fit info
## lavaan (0.5-20) converged normally after 83 iterations
##
## Number of observations 803
##
## Estimator ML
## Minimum Function Test Statistic 33.578
## Degrees of freedom 10
## P-value (Chi-square) 0.000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 336.588
## Degrees of freedom 18
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.926
## Tucker-Lewis Index (TLI) 0.867
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -6179.642
## Loglikelihood unrestricted model (H1) -6162.854
##
## Number of free parameters 18
## Akaike (AIC) 12395.285
## Bayesian (BIC) 12479.675
## Sample-size adjusted Bayesian (BIC) 12422.515
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.054
## 90 Percent Confidence Interval 0.035 0.075
## P-value RMSEA <= 0.05 0.333
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.034
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## Schizo ~
## EarlyOn 0.086 0.038 2.279 0.023
## GenxEnv 0.084 0.024 3.577 0.000
## Psysym -0.041 0.010 -4.003 0.000
## Psysym ~
## EarlyOn 1.438 0.433 3.322 0.001
## Conduct 2.866 0.452 6.335 0.000
## Schizo 13.109 3.301 3.971 0.000
## Conduct ~
## EarlyOn 0.333 0.031 10.661 0.000
## Chpsyq ~
## COMT -0.550 0.252 -2.183 0.029
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## EarlyOn ~~
## GenxEnv 0.188 0.011 17.368 0.000
## COMT 0.007 0.011 0.656 0.512
## GenxEnv ~~
## COMT 0.134 0.015 8.989 0.000
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## Schizo 0.052 0.009 5.529 0.000
## Psysym 16.318 2.973 5.488 0.000
## Conduct 0.143 0.007 20.037 0.000
## Chpsyq 25.463 1.271 20.037 0.000
## EarlyOn 0.183 0.009 20.037 0.000
## GenxEnv 0.321 0.016 20.037 0.000
## COMT 0.499 0.025 20.037 0.000
# Just the parameter estimates
parameterEstimates(comt7.fit1)
## lhs op rhs est se z pvalue ci.lower ci.upper
## 1 Schizo ~ EarlyOn 0.086 0.038 2.279 0.023 0.012 0.160
## 2 Schizo ~ GenxEnv 0.084 0.024 3.577 0.000 0.038 0.131
## 3 Schizo ~ Psysym -0.041 0.010 -4.003 0.000 -0.061 -0.021
## 4 Psysym ~ EarlyOn 1.438 0.433 3.322 0.001 0.590 2.287
## 5 Psysym ~ Conduct 2.866 0.452 6.335 0.000 1.979 3.753
## 6 Psysym ~ Schizo 13.109 3.301 3.971 0.000 6.639 19.579
## 7 Conduct ~ EarlyOn 0.333 0.031 10.661 0.000 0.272 0.395
## 8 Chpsyq ~ COMT -0.550 0.252 -2.183 0.029 -1.044 -0.056
## 9 Schizo ~~ Schizo 0.052 0.009 5.529 0.000 0.033 0.070
## 10 Psysym ~~ Psysym 16.318 2.973 5.488 0.000 10.491 22.146
## 11 Conduct ~~ Conduct 0.143 0.007 20.037 0.000 0.129 0.158
## 12 Chpsyq ~~ Chpsyq 25.463 1.271 20.037 0.000 22.972 27.954
## 13 EarlyOn ~~ EarlyOn 0.183 0.009 20.037 0.000 0.165 0.201
## 14 EarlyOn ~~ GenxEnv 0.188 0.011 17.368 0.000 0.167 0.209
## 15 EarlyOn ~~ COMT 0.007 0.011 0.656 0.512 -0.014 0.028
## 16 GenxEnv ~~ GenxEnv 0.321 0.016 20.037 0.000 0.289 0.352
## 17 GenxEnv ~~ COMT 0.134 0.015 8.989 0.000 0.105 0.163
## 18 COMT ~~ COMT 0.499 0.025 20.037 0.000 0.451 0.548
# The fitted cov matrix
fitted(comt7.fit1)
## $cov
## Schizo Psysym Condct Chpsyq ErlyOn GnxEnv COMT
## Schizo 0.035
## Psysym 0.017 12.963
## Conduct -0.008 0.453 0.164
## Chpsyq -0.004 -0.062 -0.001 25.614
## EarlyOn 0.009 0.555 0.061 -0.004 0.183
## GenxEnv 0.016 0.663 0.063 -0.074 0.188 0.321
## COMT 0.007 0.113 0.002 -0.275 0.007 0.134 0.499
##
## $mean
## Schizo Psysym Conduct Chpsyq EarlyOn GenxEnv COMT
## 0 0 0 0 0 0 0
# Plot!
semPaths(comt7.fit1, what='std',
node.label.cex=5,
edge.label.cex=1.25, curvePivot = TRUE,
fade=FALSE)
Looking at the model output, we can see that this model is not doing very well, since the estimated covariance matrix is significantly different from the actual covariance matrix, \(\chi^2\)(10) = 33.578, p < 0.001.
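That p-value can be recovered directly from the chi-square statistic and its degrees of freedom (p_fit1 is our own name for it):

```r
# Recover the model-fit p-value from the chi-square test statistic
p_fit1 <- pchisq(33.578, df = 10, lower.tail = FALSE)
p_fit1   # well below 0.05, so the model-implied and observed covariances differ reliably
```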
Still, we’re doing a pretty good job predicting some things. As predicted from our original causal diagram, adolescent-onset use of cannabis (EarlyOn) significantly predicts whether or not a person is diagnosed with schizophreniform disorder at age 26 (Schizo), z = 2.3, p < 0.05. Schizo is also significantly predicted by GenxEnv and Psysym (\(p\)s < 0.05). So that’s good! But let’s take a better look at where we’re failing…
plot_matrix(residuals(comt7.fit1)$cov)
residuals(comt7.fit1)
## $type
## [1] "raw"
##
## $cov
## Schizo Psysym Condct Chpsyq ErlyOn GnxEnv COMT
## Schizo 0.000
## Psysym 0.001 0.045
## Conduct 0.001 0.012 0.000
## Chpsyq 0.100 0.960 0.217 0.000
## EarlyOn 0.000 0.000 0.000 0.034 0.000
## GenxEnv -0.001 0.032 -0.003 -0.089 0.000 0.000
## COMT -0.006 0.094 0.008 0.000 0.000 0.000 0.000
##
## $mean
## Schizo Psysym Conduct Chpsyq EarlyOn GenxEnv COMT
## 0 0 0 0 0 0 0
We might want to add in some relationship between Psysym and Chpsyq, or between Schizo and Chpsyq, or between Conduct and Chpsyq.
mi = modificationIndices(comt7.fit1)
head(mi[order(-mi$mi), ], 10)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 26 Schizo ~ Chpsyq 12.545 0.006 0.006 0.152 0.152
## 21 Schizo ~~ Chpsyq 12.361 0.142 0.142 0.150 0.150
## 24 Conduct ~~ Chpsyq 9.688 0.210 0.210 0.103 0.103
## 38 Chpsyq ~ Conduct 9.423 1.351 1.351 0.108 0.108
## 33 Conduct ~ Chpsyq 9.249 0.008 0.008 0.100 0.100
## 36 Chpsyq ~ Schizo 8.381 2.756 2.756 0.102 0.102
## 50 GenxEnv ~ Chpsyq 5.041 -0.005 -0.005 -0.043 -0.043
## 56 COMT ~ Chpsyq 4.953 0.019 0.019 0.136 0.136
## 44 EarlyOn ~ Chpsyq 4.357 0.004 0.004 0.043 0.043
## 30 Psysym ~ COMT 2.498 0.328 0.328 0.064 0.091
Here the “mi” column gives a rough approximation of how the model would improve if new parameters were added. EPC (expected parameter change) is the value this parameter would have if you added it.
Let’s add in Chpsyq predicting Schizo, and a covariance between Conduct and Chpsyq.
comt7.model2 = '
Schizo ~ EarlyOn + GenxEnv + Psysym + Chpsyq
Psysym ~ EarlyOn + Conduct + Schizo
Conduct ~ EarlyOn
Chpsyq ~ COMT
Conduct ~~ Chpsyq
'
comt7.fit2 = sem(comt7.model2, sample.cov = comt7.cov, sample.nobs = 803)
summary(comt7.fit2)
## lavaan (0.5-20) converged normally after 79 iterations
##
## Number of observations 803
##
## Estimator ML
## Minimum Function Test Statistic 11.007
## Degrees of freedom 8
## P-value (Chi-square) 0.201
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## Schizo ~
## EarlyOn 0.073 0.035 2.064 0.039
## GenxEnv 0.091 0.023 3.961 0.000
## Psysym -0.039 0.009 -4.334 0.000
## Chpsyq 0.006 0.002 3.399 0.001
## Psysym ~
## EarlyOn 1.479 0.408 3.621 0.000
## Conduct 2.827 0.422 6.704 0.000
## Schizo 12.545 2.936 4.273 0.000
## Conduct ~
## EarlyOn 0.332 0.031 10.676 0.000
## Chpsyq ~
## COMT -0.572 0.250 -2.286 0.022
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## Conduct ~~
## Chpsyq 0.210 0.068 3.097 0.002
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## Schizo 0.049 0.008 6.276 0.000
## Psysym 15.841 2.556 6.197 0.000
## Conduct 0.143 0.007 20.037 0.000
## Chpsyq 25.463 1.271 20.037 0.000
anova(comt7.fit2, comt7.fit1)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## comt7.fit2 8 12365 12430 11.007
## comt7.fit1 10 12395 12480 33.578 22.571 2 1.256e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This new model is doing significantly better than our first model! We have a lower AIC value, and our chi-squared is significantly lower! That means we’re doing a better job reproducing the covariance matrix!
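The chi-square difference test that anova() reports is just the difference in test statistics, referred to a chi-square distribution with the difference in dfs; we can reproduce it by hand:

```r
# Chi-square difference test for nested models, by hand
chisq_diff <- 33.578 - 11.007   # fit1 minus fit2
df_diff    <- 10 - 8            # fit2 estimates 2 extra parameters
p_diff     <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
p_diff   # matches the Pr(>Chisq) of ~1.26e-05 in the anova() table
```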
plot_matrix(residuals(comt7.fit2)$cov)
mi = modificationIndices(comt7.fit2)
head(mi[order(-mi$mi), ], 10)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 56 COMT ~ Chpsyq 5.240 0.019 0.019 0.134 0.134
## 50 GenxEnv ~ Chpsyq 5.008 -0.005 -0.005 -0.043 -0.043
## 44 EarlyOn ~ Chpsyq 4.329 0.004 0.004 0.043 0.043
## 30 Psysym ~ COMT 2.560 0.325 0.325 0.064 0.090
## 28 Psysym ~ Chpsyq 2.147 -0.044 -0.044 -0.062 -0.062
## 24 Psysym ~~ Conduct 1.806 0.703 0.703 0.482 0.482
## 25 Psysym ~~ Chpsyq 1.806 -1.030 -1.030 -0.056 -0.056
## 29 Psysym ~ GenxEnv 1.722 0.575 0.575 0.090 0.159
## 37 Chpsyq ~ Psysym 1.461 -0.078 -0.078 -0.056 -0.056
## 55 COMT ~ Conduct 1.394 0.063 0.063 0.036 0.036
Now let’s add in a parameter to see whether severity of childhood psychotic symptoms (Chpsyq) predicts severity of psychotic symptoms (Psysym), since we might be missing something there!
comt7.model3 = '
Schizo ~ EarlyOn + GenxEnv + Psysym + Chpsyq
Psysym ~ EarlyOn + Conduct + Schizo + Chpsyq
Conduct ~ EarlyOn
Chpsyq ~ COMT
Conduct ~~ Chpsyq
'
comt7.fit3 = sem(comt7.model3, sample.cov = comt7.cov, sample.nobs = 803)
summary(comt7.fit3)
## lavaan (0.5-20) converged normally after 80 iterations
##
## Number of observations 803
##
## Estimator ML
## Minimum Function Test Statistic 8.908
## Degrees of freedom 7
## P-value (Chi-square) 0.259
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## Schizo ~
## EarlyOn 0.082 0.038 2.192 0.028
## GenxEnv 0.096 0.024 3.926 0.000
## Psysym -0.044 0.010 -4.247 0.000
## Chpsyq 0.006 0.002 3.486 0.000
## Psysym ~
## EarlyOn 1.345 0.447 3.009 0.003
## Conduct 3.006 0.477 6.309 0.000
## Schizo 14.198 3.401 4.174 0.000
## Chpsyq -0.045 0.033 -1.359 0.174
## Conduct ~
## EarlyOn 0.332 0.031 10.676 0.000
## Chpsyq ~
## COMT -0.572 0.250 -2.286 0.022
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## Conduct ~~
## Chpsyq 0.210 0.068 3.097 0.002
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## Schizo 0.053 0.010 5.346 0.000
## Psysym 17.251 3.279 5.261 0.000
## Conduct 0.143 0.007 20.037 0.000
## Chpsyq 25.463 1.271 20.037 0.000
anova(comt7.fit2, comt7.fit3)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## comt7.fit3 7 12365 12435 8.9079
## comt7.fit2 8 12365 12430 11.0071 2.0992 1 0.1474
# plot_matrix(residuals(comt7.fit3)$cov)
Here we didn’t significantly decrease the \(\chi^2\), and we added more predictors, so let’s stick with the more parsimonious model.
What if we explicitly remove the covariance between GenxEnv and COMT?
comt7.model4 = '
Schizo ~ EarlyOn + GenxEnv + Psysym + Chpsyq
Psysym ~ EarlyOn + Conduct + Schizo + Chpsyq
Conduct ~ EarlyOn
Chpsyq ~ COMT
Conduct ~~ Chpsyq
GenxEnv ~~ 0*COMT
'
comt7.fit4 = sem(comt7.model4, sample.cov = comt7.cov, sample.nobs = 803)
summary(comt7.fit4)
## lavaan (0.5-20) converged normally after 80 iterations
##
## Number of observations 803
##
## Estimator ML
## Minimum Function Test Statistic 8.908
## Degrees of freedom 7
## P-value (Chi-square) 0.259
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## Schizo ~
## EarlyOn 0.082 0.038 2.192 0.028
## GenxEnv 0.096 0.024 3.926 0.000
## Psysym -0.044 0.010 -4.247 0.000
## Chpsyq 0.006 0.002 3.486 0.000
## Psysym ~
## EarlyOn 1.345 0.447 3.009 0.003
## Conduct 3.006 0.477 6.309 0.000
## Schizo 14.198 3.401 4.174 0.000
## Chpsyq -0.045 0.033 -1.359 0.174
## Conduct ~
## EarlyOn 0.332 0.031 10.676 0.000
## Chpsyq ~
## COMT -0.572 0.250 -2.286 0.022
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## Conduct ~~
## Chpsyq 0.210 0.068 3.097 0.002
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## Schizo 0.053 0.010 5.346 0.000
## Psysym 17.251 3.279 5.261 0.000
## Conduct 0.143 0.007 20.037 0.000
## Chpsyq 25.463 1.271 20.037 0.000
anova(comt7.fit2, comt7.fit4)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## comt7.fit4 7 12365 12435 8.9079
## comt7.fit2 8 12365 12430 11.0071 2.0992 1 0.1474
# plot_matrix(residuals(comt7.fit4)$cov)
Here, note that since I’ve explicitly specified a covariance in this model (or the lack thereof), it overrides the default behavior; I have to define the other covariances manually if I want to keep them!
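For instance, to zero out just the GenxEnv–COMT covariance while keeping the others, you would list the remaining exogenous covariances explicitly. A hypothetical sketch (comt7.model4b is our own name; this model isn't fit above):

```r
# Fix one exogenous covariance to zero, but keep the others explicitly
comt7.model4b = '
Schizo ~ EarlyOn + GenxEnv + Psysym + Chpsyq
Psysym ~ EarlyOn + Conduct + Schizo + Chpsyq
Conduct ~ EarlyOn
Chpsyq ~ COMT
Conduct ~~ Chpsyq
GenxEnv ~~ 0*COMT      # constrained to zero
EarlyOn ~~ GenxEnv     # retained free covariances
EarlyOn ~~ COMT
'
```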
Try other variations to get the best model!
Here, Sadler & Woody (2003) test an interpersonal theory of dyadic interaction: Person M’s ‘behavior’ when interacting with person F (i.e., M’s situational behavior) is influenced by (i) M’s long-term tendency to elicit the behavior (i.e., M’s behavioral trait), and (ii) F’s situational behavior. The same is assumed true for F.
The four variables, trait and behavior for each of M and F, are represented as latent variables that can be measured in multiple ways. “Reciprocal influence” is conceptualized as M’s situational behavior influencing F’s situational behavior, and vice versa. In this study, M and F were strangers, so M’s trait (which is unknown to F) couldn’t affect F’s situational behavior. However, if M and F were friends, such links would be expected to be significant and should be included in the model. Trait is measured by self-report and a friend’s report; situational behavior is rated by self, partner, and an observer. Rater biases are introduced through the use of covariance arrows.
low.dom = '
1
.415 1
.460 .351 1
-.321 -.374 -.310 1
-.239 -.221 -.133 .626 1
-.185 -.164 -.272 .533 .345 1
.349 .307 .496 -.180 -.081 -.067 1
.308 .302 .562 -.222 -.156 -.118 .573 1
.038 -.115 -.048 .296 .167 .320 .008 -.036 1
-.072 -.160 -.124 .317 .167 .248 -.152 -.175 .414 1
'
labels1 = c("FSOR", "FSIR", "FSSR", "MSOR", "MSIR", "MSSR", "FTFR", "FTSR", "MTSR", "MTFR")
dom.cov = getCov(low.dom, names = labels1)
plot_matrix(dom.cov)
Here, we have latent variables that are measured by observed variables. Specifically, female trait dominance (FTZ) is measured by friends’ and self trait ratings (FTSR and FTFR). Similarly, female situational dominance (FSZ) is measured by self, partner, and observers’ ratings. The same is true for males. Thus, we’ll have to define these latent variables in the model, and then see how these latent variables are predicted by other latent variables!
We want to estimate the paths A and B (the effect of the individuals’ traits on their situational behavior) and C and D (the effects of individuals’ situational behavior on each other). Let’s first allow these paths to be unconstrained (i.e., we can get a different estimate for each path).
Using the * operator below “names” the regression coefficient. This is useful mainly because if we give two coefficients the same name, lavaan automatically constrains them to be equal.
dyad.model1 = '
# Latent variable definitions for trait, FTZ & MTZ, and situation, FSZ & MSZ
FTZ =~ FTSR + FTFR
MTZ =~ MTSR + MTFR
FSZ =~ FSSR + FSIR + FSOR
MSZ =~ MSSR + MSIR + MSOR
# regressions
FSZ ~ B*FTZ + C*MSZ
MSZ ~ A*MTZ + D*FSZ
# variances and covariances
# Residual correls among M ratings, MTSR, MSSR, FSIR
MTSR + MSSR ~~ FSIR
MSSR ~~ MTSR
# Residual correls among F ratings, FTSR, FSSR, MSIR
FTSR + FSSR ~~ MSIR
FSSR ~~ FTSR
# Residual correls among Observer situational ratings, MSOR, FSOR
MSOR ~~ FSOR
'
dyad.fit1 = sem(dyad.model1, fixed.x = F, sample.cov = dom.cov, sample.nobs = 112)
summary(dyad.fit1) #good p value
## lavaan (0.5-20) converged normally after 41 iterations
##
## Number of observations 112
##
## Estimator ML
## Minimum Function Test Statistic 17.306
## Degrees of freedom 23
## P-value (Chi-square) 0.794
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err Z-value P(>|z|)
## FTZ =~
## FTSR 1.000
## FTFR 1.120 0.231 4.858 0.000
## MTZ =~
## MTSR 1.000
## MTFR 1.129 0.382 2.952 0.003
## FSZ =~
## FSSR 1.000
## FSIR 0.833 0.167 4.984 0.000
## FSOR 0.928 0.175 5.297 0.000
## MSZ =~
## MSSR 1.000
## MSIR 1.127 0.207 5.435 0.000
## MSOR 1.678 0.302 5.563 0.000
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## FSZ ~
## FTZ (B) 0.696 0.136 5.116 0.000
## MSZ (C) -0.283 0.160 -1.769 0.077
## MSZ ~
## MTZ (A) 0.418 0.152 2.741 0.006
## FSZ (D) -0.231 0.115 -2.008 0.045
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## MTSR ~~
## FSIR -0.044 0.073 -0.599 0.549
## FSIR ~~
## MSSR 0.058 0.069 0.838 0.402
## MTSR ~~
## MSSR 0.130 0.073 1.785 0.074
## FTSR ~~
## MSIR -0.028 0.059 -0.474 0.635
## FSSR ~~
## MSIR 0.074 0.061 1.213 0.225
## FTSR ~~
## FSSR 0.147 0.074 1.985 0.047
## FSOR ~~
## MSOR 0.008 0.057 0.144 0.886
## FTZ ~~
## MTZ -0.073 0.062 -1.187 0.235
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## FTSR 0.485 0.111 4.380 0.000
## FTFR 0.375 0.124 3.025 0.002
## MTSR 0.629 0.142 4.422 0.000
## MTFR 0.545 0.163 3.342 0.001
## FSSR 0.498 0.098 5.070 0.000
## FSIR 0.661 0.105 6.291 0.000
## FSOR 0.571 0.101 5.665 0.000
## MSSR 0.677 0.099 6.842 0.000
## MSIR 0.571 0.091 6.280 0.000
## MSOR 0.088 0.106 0.837 0.403
## FTZ 0.491 0.143 3.419 0.001
## MTZ 0.350 0.152 2.310 0.021
## FSZ 0.158 0.073 2.167 0.030
## MSZ 0.189 0.065 2.920 0.004
semPaths(dyad.fit1, what='std',
node.label.cex=5,
edge.label.cex=1.25, curvePivot = TRUE,
fade=FALSE)
This model is ok, but let’s see if we can get a simpler model that works! Let’s try taking out most of the covariance terms.
dyad.model2 = '
# Latent variable definitions for trait, FTZ & MTZ, and situation, FSZ & MSZ
FTZ =~ FTSR + FTFR
MTZ =~ MTSR + MTFR
FSZ =~ FSSR + FSIR + FSOR
MSZ =~ MSSR + MSIR + MSOR
# regressions
FSZ ~ B*FTZ + C*MSZ
MSZ ~ A*MTZ + D*FSZ
# Residual correls among M ratings, MTSR, MSSR
MSSR ~~ MTSR
# Residual correls among F ratings, FTSR, FSSR
FSSR ~~ FTSR
'
dyad.fit2 = sem(dyad.model2, fixed.x = F, sample.cov = dom.cov, sample.nobs = 112)
summary(dyad.fit2) # Wow, even better !
## lavaan (0.5-20) converged normally after 38 iterations
##
## Number of observations 112
##
## Estimator ML
## Minimum Function Test Statistic 20.970
## Degrees of freedom 28
## P-value (Chi-square) 0.827
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err Z-value P(>|z|)
## FTZ =~
## FTSR 1.000
## FTFR 1.122 0.231 4.862 0.000
## MTZ =~
## MTSR 1.000
## MTFR 1.110 0.370 2.996 0.003
## FSZ =~
## FSSR 1.000
## FSIR 0.818 0.167 4.899 0.000
## FSOR 0.916 0.172 5.321 0.000
## MSZ =~
## MSSR 1.000
## MSIR 1.170 0.220 5.308 0.000
## MSOR 1.804 0.338 5.334 0.000
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## FSZ ~
## FTZ (B) 0.713 0.137 5.194 0.000
## MSZ (C) -0.281 0.158 -1.778 0.075
## MSZ ~
## MTZ (A) 0.392 0.145 2.711 0.007
## FSZ (D) -0.209 0.107 -1.949 0.051
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## MTSR ~~
## MSSR 0.133 0.073 1.823 0.068
## FTSR ~~
## FSSR 0.145 0.075 1.937 0.053
## FTZ ~~
## MTZ -0.074 0.062 -1.196 0.232
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## FTSR 0.486 0.110 4.401 0.000
## FTFR 0.375 0.124 3.029 0.002
## MTSR 0.624 0.142 4.380 0.000
## MTFR 0.550 0.160 3.445 0.001
## FSSR 0.498 0.098 5.073 0.000
## FSIR 0.663 0.105 6.316 0.000
## FSOR 0.580 0.100 5.782 0.000
## MSSR 0.697 0.100 6.978 0.000
## MSIR 0.590 0.092 6.384 0.000
## MSOR 0.036 0.113 0.319 0.750
## FTZ 0.489 0.143 3.415 0.001
## MTZ 0.358 0.153 2.339 0.019
## FSZ 0.156 0.072 2.168 0.030
## MSZ 0.179 0.061 2.918 0.004
anova(dyad.fit1,dyad.fit2)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## dyad.fit1 23 2925.1 3012.1 17.306
## dyad.fit2 28 2918.8 2992.2 20.970 3.6634 5 0.5988
This simpler model is definitely the better one! We have more dfs (fewer parameters estimated), a lower AIC, and the \(\chi^2\) isn’t significantly worse!
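As before, the anova() difference test can be checked by hand from the table above:

```r
# Difference test for dyad.fit1 vs. dyad.fit2, by hand
chisq_diff <- 20.970 - 17.306   # fit2 minus fit1
df_diff    <- 28 - 23           # fit2 drops 5 covariance parameters
p_diff     <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
p_diff   # ~0.60: the extra covariance terms weren't buying us anything
```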
Let’s see if we need our “C” and “D” paths to be different. We have one fewer parameter to estimate if we constrain them to be the same, so it would be more parsimonious if we can have a constrained model!
dyad.model3 = '
# Latent variable definitions for trait, FTZ & MTZ, and situation, FSZ & MSZ
FTZ =~ FTSR + FTFR
MTZ =~ MTSR + MTFR
FSZ =~ FSSR + FSIR + FSOR
MSZ =~ MSSR + MSIR + MSOR
# regressions
FSZ ~ B*FTZ + C*MSZ
MSZ ~ A*MTZ + C*FSZ
# Residual correls among M ratings, MTSR, MSSR
MSSR ~~ MTSR
# Residual correls among F ratings, FTSR, FSSR
FSSR ~~ FTSR
'
dyad.fit3 = sem(dyad.model3, fixed.x = T, sample.cov = dom.cov, sample.nobs = 112) # Notice how I set fixed.x = T here so it wouldn't automatically add terms I didn't want
summary(dyad.fit3)
## lavaan (0.5-20) converged normally after 35 iterations
##
## Number of observations 112
##
## Estimator ML
## Minimum Function Test Statistic 21.055
## Degrees of freedom 29
## P-value (Chi-square) 0.857
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err Z-value P(>|z|)
## FTZ =~
## FTSR 1.000
## FTFR 1.118 0.229 4.881 0.000
## MTZ =~
## MTSR 1.000
## MTFR 1.091 0.365 2.991 0.003
## FSZ =~
## FSSR 1.000
## FSIR 0.823 0.167 4.918 0.000
## FSOR 0.925 0.172 5.366 0.000
## MSZ =~
## MSSR 1.000
## MSIR 1.155 0.211 5.484 0.000
## MSOR 1.775 0.316 5.625 0.000
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## FSZ ~
## FTZ (B) 0.714 0.138 5.186 0.000
## MSZ (C) -0.237 0.061 -3.874 0.000
## MSZ ~
## MTZ (A) 0.390 0.144 2.717 0.007
## FSZ (C) -0.237 0.061 -3.874 0.000
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## MTSR ~~
## MSSR 0.132 0.073 1.819 0.069
## FTSR ~~
## FSSR 0.148 0.074 1.985 0.047
## FTZ ~~
## MTZ -0.073 0.062 -1.167 0.243
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## FTSR 0.485 0.110 4.412 0.000
## FTFR 0.380 0.123 3.088 0.002
## MTSR 0.617 0.144 4.283 0.000
## MTFR 0.556 0.159 3.502 0.000
## FSSR 0.499 0.098 5.081 0.000
## FSIR 0.663 0.105 6.305 0.000
## FSOR 0.576 0.100 5.742 0.000
## MSSR 0.695 0.100 6.959 0.000
## MSIR 0.588 0.092 6.390 0.000
## MSOR 0.040 0.111 0.358 0.720
## FTZ 0.489 0.143 3.429 0.001
## MTZ 0.366 0.156 2.352 0.019
## FSZ 0.161 0.072 2.237 0.025
## MSZ 0.182 0.061 2.996 0.003
anova(dyad.fit1,dyad.fit2,dyad.fit3) #looks like constraining it made the model better! So they probably aren't actually different
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## dyad.fit1 23 2925.1 3012.1 17.306
## dyad.fit2 28 2918.8 2992.2 20.970 3.6634 5 0.5988
## dyad.fit3 29 2916.8 2987.5 21.055 0.0850 1 0.7707
#could also test it this way:
dyad.fit3a = sem(dyad.model2, fixed.x = F, sample.cov = dom.cov, sample.nobs = 112, constraints = 'C - D == 0', start = dyad.fit2) #using our model2 design, but constraining C-D to be 0 (i.e., C = D)
# summary(dyad.fit3a) #same result as before
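Since dyad.fit3a expresses the same restriction as dyad.model3, one way to reassure ourselves that the two approaches agree (assuming both fits are still in the workspace) is to compare their test statistics directly:

```r
# the explicitly-constrained fit should reproduce dyad.fit3's test statistic
fitMeasures(dyad.fit3, "chisq")   # ~21.055
fitMeasures(dyad.fit3a, "chisq")  # should match: same model, different syntax
```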
For this question, we assess Prof Jeanne Tsai’s “Affect Valuation Theory” (AVT), about the antecedents and consequences of ideal affect, and how it might vary across cultures. Specifically, this theory hypothesizes that culture influences ideal affect more than actual affect, whereas temperament/personality influences actual affect more than ideal affect. Further, engaging in rigorous activities is more strongly predicted by ideal affect, whereas depression is more strongly predicted by actual affect. In addition, culture (Asian American, Hong Kong Chinese, and European American / AA, CH, EA) might influence these processes.
To start out, let’s specify the model for the whole sample, ignoring ‘group’.
d = read.csv("http://www.stanford.edu/class/psych253/data/jt-data1.csv")
avt.model1 = '
# regressions
ideahap + actuhap ~ cultatt + temperatt
rigoract + depress ~ ideahap + actuhap + cultatt + temperatt
# variances and covariances
rigoract ~~ 0*depress
'
avt.fit1 = sem(avt.model1, fixed.x = F, data = d)
summary(avt.fit1)
## lavaan (0.5-20) converged normally after 62 iterations
##
## Used Total
## Number of observations 215 239
##
## Estimator ML
## Minimum Function Test Statistic 9.547
## Degrees of freedom 2
## P-value (Chi-square) 0.008
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt 0.075 0.022 3.455 0.001
## temperatt 0.131 0.031 4.260 0.000
## actuhap ~
## cultatt 0.111 0.035 3.165 0.002
## temperatt 0.354 0.049 7.188 0.000
## rigoract ~
## ideahap 0.508 0.262 1.939 0.053
## actuhap -0.071 0.163 -0.436 0.663
## cultatt -0.105 0.088 -1.193 0.233
## temperatt 0.314 0.136 2.318 0.020
## depress ~
## ideahap 1.687 1.724 0.979 0.328
## actuhap -1.646 1.073 -1.534 0.125
## cultatt 0.401 0.577 0.696 0.487
## temperatt -6.187 0.891 -6.942 0.000
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## cultatt ~~
## temperatt 0.280 0.057 4.879 0.000
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.100 0.010 10.368 0.000
## actuhap 0.257 0.025 10.368 0.000
## rigoract 1.473 0.142 10.368 0.000
## depress 63.595 6.134 10.368 0.000
## cultatt 1.118 0.108 10.368 0.000
## temperatt 0.563 0.054 10.368 0.000
Model 1 is almost saturated (only 2 df are left for testing), yet its fit is poor (p = 0.008). This suggests that many of the paths are unhelpful, i.e., have coefficients near 0. These unhelpful paths should be removed and replaced by useful paths. I was surprised that sem() estimated cov(rigoract, depress), because these variables are not exogenous. It appears that sem(), by default, puts a covariance path between any pair of variables that are not already linked in the model! This is a waste of parameters, so we explicitly set these unwanted covariance paths equal to 0.
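As an aside, the lavaan syntax for fixing (rather than estimating) a parameter is pre-multiplication by a constant inside the model string. A minimal illustrative fragment, with hypothetical variables y1, y2, f, x1, and x2:

```r
fix.demo = '
# pre-multiplying by a constant fixes the parameter instead of estimating it
y1 ~~ 0*y2      # residual covariance fixed to 0 (as with rigoract ~~ 0*depress)
f =~ 1*x1 + x2  # loading fixed to 1 (lavaan does this for the first indicator by default)
'
```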
A common finding in this field is that ideal HAP and actual HAP are correlated. So introduce a cov link between them.
#This model removes the paths with high p-values (the non-significant ones), and adds the new covariance
avt.model2 = '
# regressions
# X + Y ~ U + V + W
ideahap + actuhap ~ cultatt + temperatt
rigoract ~ ideahap + temperatt
depress ~ actuhap + temperatt
# variances and covariances
# X ~~ Y
rigoract ~~ 0*depress
ideahap ~~ actuhap
'
avt.fit2 = sem(avt.model2, fixed.x = F, data = d)
summary(avt.fit2) #much better!
## lavaan (0.5-20) converged normally after 45 iterations
##
## Used Total
## Number of observations 215 239
##
## Estimator ML
## Minimum Function Test Statistic 3.556
## Degrees of freedom 5
## P-value (Chi-square) 0.615
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt 0.075 0.022 3.455 0.001
## temperatt 0.131 0.031 4.260 0.000
## actuhap ~
## cultatt 0.111 0.035 3.165 0.002
## temperatt 0.354 0.049 7.188 0.000
## rigoract ~
## ideahap 0.407 0.256 1.588 0.112
## temperatt 0.250 0.119 2.107 0.035
## depress ~
## actuhap -1.224 1.053 -1.162 0.245
## temperatt -5.877 0.845 -6.953 0.000
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.033 0.011 2.991 0.003
## cultatt ~~
## temperatt 0.280 0.057 4.879 0.000
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.100 0.010 10.368 0.000
## actuhap 0.257 0.025 10.368 0.000
## rigoract 1.485 0.143 10.368 0.000
## depress 64.104 6.183 10.368 0.000
## cultatt 1.118 0.108 10.368 0.000
## temperatt 0.563 0.054 10.368 0.000
anova(avt.fit1, avt.fit2)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## avt.fit1 2 3758.8 3822.8 9.5469
## avt.fit2 5 3746.8 3800.7 3.5562 -5.9907 3 1
Note that the \(\chi^2\) difference in the table above is negative: model 2 dropped four regression paths but added the ideahap ~~ actuhap covariance, so the two models aren’t strictly nested and the difference test (p = 1) is only heuristic; the AIC comparison still clearly favors model 2. Now, we might wonder if the model we specified above applies to any one of the three groups (EA, AA, and CH).
grplab1 = c("EA", "AA", "CH")
avt.group1 = sem(avt.model2, fixed.x = F, data = d, group = "group", meanstructure = F)
summary(avt.group1)
## lavaan (0.5-20) converged normally after 130 iterations
##
## Used Total
## Number of observations per group
## 1 65 75
## 2 67 80
## 3 83 84
##
## Estimator ML
## Minimum Function Test Statistic 15.149
## Degrees of freedom 15
## P-value (Chi-square) 0.441
##
## Chi-square for each group:
##
## 1 9.798
## 2 2.776
## 3 2.574
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
##
## Group 1 [1]:
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt 0.062 0.037 1.661 0.097
## temperatt 0.172 0.066 2.626 0.009
## actuhap ~
## cultatt 0.005 0.054 0.095 0.925
## temperatt 0.574 0.094 6.079 0.000
## rigoract ~
## ideahap 0.687 0.627 1.096 0.273
## temperatt 0.309 0.352 0.878 0.380
## depress ~
## actuhap -0.280 2.128 -0.132 0.895
## temperatt -5.327 1.998 -2.666 0.008
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.014 0.016 0.861 0.390
## cultatt ~~
## temperatt 0.138 0.076 1.819 0.069
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.090 0.016 5.701 0.000
## actuhap 0.187 0.033 5.701 0.000
## rigoract 2.394 0.420 5.701 0.000
## depress 55.005 9.649 5.701 0.000
## cultatt 1.047 0.184 5.701 0.000
## temperatt 0.340 0.060 5.701 0.000
##
##
## Group 2 [2]:
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt 0.007 0.028 0.245 0.806
## temperatt 0.130 0.041 3.205 0.001
## actuhap ~
## cultatt 0.121 0.059 2.041 0.041
## temperatt 0.454 0.085 5.327 0.000
## rigoract ~
## ideahap 0.593 0.559 1.061 0.289
## temperatt 0.014 0.189 0.072 0.943
## depress ~
## actuhap -4.514 2.094 -2.155 0.031
## temperatt -4.544 1.774 -2.562 0.010
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.037 0.013 2.757 0.006
## cultatt ~~
## temperatt 0.265 0.096 2.766 0.006
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.049 0.009 5.788 0.000
## actuhap 0.218 0.038 5.788 0.000
## rigoract 1.035 0.179 5.788 0.000
## depress 68.089 11.764 5.788 0.000
## cultatt 1.063 0.184 5.788 0.000
## temperatt 0.514 0.089 5.788 0.000
##
##
## Group 3 [3]:
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt 0.093 0.044 2.122 0.034
## temperatt 0.061 0.053 1.152 0.250
## actuhap ~
## cultatt 0.198 0.068 2.904 0.004
## temperatt 0.204 0.082 2.479 0.013
## rigoract ~
## ideahap 0.118 0.315 0.375 0.708
## temperatt 0.299 0.157 1.911 0.056
## depress ~
## actuhap -0.079 1.496 -0.053 0.958
## temperatt -7.142 1.221 -5.848 0.000
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.044 0.022 2.023 0.043
## cultatt ~~
## temperatt 0.092 0.073 1.252 0.211
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.125 0.019 6.442 0.000
## actuhap 0.303 0.047 6.442 0.000
## rigoract 1.087 0.169 6.442 0.000
## depress 61.947 9.616 6.442 0.000
## cultatt 0.799 0.124 6.442 0.000
## temperatt 0.547 0.085 6.442 0.000
The fit to each group is acceptable when the parameters are free to vary across groups, though the fit to Group 1 is marginal (\(\chi^2\) = 9.8 with 5 df, p = 0.08).
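summary() prints each group’s \(\chi^2\) but not its p-value; with the 15 total df split evenly over the three groups (5 each), the per-group p-values are easy to recover in base R:

```r
# per-group fit: each group's chi-squared against its share of the df
grp_chisq = c(9.798, 2.776, 2.574)
round(pchisq(grp_chisq, df = 5, lower.tail = FALSE), 3)
# group 1 is ~0.08 (marginal); groups 2 and 3 fit comfortably
```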
What if we were to equate params across groups?
avt.group2 = sem(avt.model2, fixed.x = F, data = d, group = "group", meanstructure = F, group.equal = "regressions")
summary(avt.group2)
## lavaan (0.5-20) converged normally after 73 iterations
##
## Used Total
## Number of observations per group
## 1 65 75
## 2 67 80
## 3 83 84
##
## Estimator ML
## Minimum Function Test Statistic 38.476
## Degrees of freedom 31
## P-value (Chi-square) 0.167
##
## Chi-square for each group:
##
## 1 18.238
## 2 8.749
## 3 11.490
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
##
## Group 1 [1]:
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt (.p1.) 0.037 0.020 1.853 0.064
## temprtt (.p2.) 0.115 0.029 3.991 0.000
## actuhap ~
## cultatt (.p3.) 0.106 0.035 3.015 0.003
## temprtt (.p4.) 0.402 0.051 7.829 0.000
## rigoract ~
## ideahap (.p5.) 0.300 0.253 1.185 0.236
## temprtt (.p6.) 0.205 0.114 1.795 0.073
## depress ~
## actuhap (.p7.) -1.041 1.046 -0.996 0.319
## temprtt (.p8.) -6.240 0.911 -6.850 0.000
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.014 0.017 0.846 0.397
## cultatt ~~
## temperatt 0.138 0.076 1.819 0.069
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.092 0.016 5.701 0.000
## actuhap 0.203 0.036 5.701 0.000
## rigoract 2.419 0.424 5.701 0.000
## depress 55.735 9.777 5.701 0.000
## cultatt 1.047 0.184 5.701 0.000
## temperatt 0.340 0.060 5.701 0.000
##
##
## Group 2 [2]:
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt (.p1.) 0.037 0.020 1.853 0.064
## temprtt (.p2.) 0.115 0.029 3.991 0.000
## actuhap ~
## cultatt (.p3.) 0.106 0.035 3.015 0.003
## temprtt (.p4.) 0.402 0.051 7.829 0.000
## rigoract ~
## ideahap (.p5.) 0.300 0.253 1.185 0.236
## temprtt (.p6.) 0.205 0.114 1.795 0.073
## depress ~
## actuhap (.p7.) -1.041 1.046 -0.996 0.319
## temprtt (.p8.) -6.240 0.911 -6.850 0.000
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.037 0.014 2.696 0.007
## cultatt ~~
## temperatt 0.265 0.096 2.766 0.006
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.050 0.009 5.788 0.000
## actuhap 0.220 0.038 5.788 0.000
## rigoract 1.051 0.182 5.788 0.000
## depress 70.889 12.248 5.788 0.000
## cultatt 1.063 0.184 5.788 0.000
## temperatt 0.514 0.089 5.788 0.000
##
##
## Group 3 [3]:
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt (.p1.) 0.037 0.020 1.853 0.064
## temprtt (.p2.) 0.115 0.029 3.991 0.000
## actuhap ~
## cultatt (.p3.) 0.106 0.035 3.015 0.003
## temprtt (.p4.) 0.402 0.051 7.829 0.000
## rigoract ~
## ideahap (.p5.) 0.300 0.253 1.185 0.236
## temprtt (.p6.) 0.205 0.114 1.795 0.073
## depress ~
## actuhap (.p7.) -1.041 1.046 -0.996 0.319
## temprtt (.p8.) -6.240 0.911 -6.850 0.000
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.053 0.023 2.271 0.023
## cultatt ~~
## temperatt 0.092 0.073 1.252 0.211
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.129 0.020 6.442 0.000
## actuhap 0.328 0.051 6.442 0.000
## rigoract 1.095 0.170 6.442 0.000
## depress 62.504 9.702 6.442 0.000
## cultatt 0.799 0.124 6.442 0.000
## temperatt 0.547 0.085 6.442 0.000
anova(avt.group1,avt.group2)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## avt.group1 15 3662.9 3824.6 15.149
## avt.group2 31 3654.2 3762.0 38.476 23.328 16 0.1053
The constrained model is not significantly worse, and it has a lower AIC, so we prefer it! The sample sizes in this data set are ‘small’ for SEM, which is one reason the different models are not significantly different: there isn’t enough power to make the distinctions.
Let us continue with model refinement purely to see how this might be done in the general case. We can start with the unconstrained model and impose equality constraints on parameters, or we can start with the constrained model and relax the equality constraints. Let’s begin by imposing equality constraints on the unconstrained model.
#this one constrains cultatt's effects to be the same across all three groups, but lets temperatt's effects differ
avt.model3 = '
# regressions
# X + Y ~ U + V + W
# Effect of cultatt on ideal hap, and on actual hap, same across cultures
# Effect of temperatt on ideal hap, and on actual hap, different across cultures
# All other params differ across cultures
ideahap ~ c(k1, k1, k1)*cultatt + c(l1, l2, l3)*temperatt
actuhap ~ c(k4, k4, k4)*cultatt + c(l4, l5, l6)*temperatt
rigoract ~ ideahap + temperatt
depress ~ actuhap + temperatt
# variances and covariances
# X ~~ Y
rigoract ~~ 0*depress
ideahap ~~ actuhap
'
avt.group3 = sem(avt.model3, fixed.x = F, data = d, group = "group", meanstructure = F)
summary(avt.group3)
## lavaan (0.5-20) converged normally after 133 iterations
##
## Used Total
## Number of observations per group
## 1 65 75
## 2 67 80
## 3 83 84
##
## Estimator ML
## Minimum Function Test Statistic 23.582
## Degrees of freedom 19
## P-value (Chi-square) 0.213
##
## Chi-square for each group:
##
## 1 13.449
## 2 4.432
## 3 5.701
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
##
## Group 1 [1]:
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt (k1) 0.035 0.020 1.777 0.076
## temperatt (l1) 0.183 0.064 2.836 0.005
## actuhap ~
## cultatt (k4) 0.097 0.035 2.806 0.005
## temperatt (l4) 0.537 0.095 5.650 0.000
## rigoract ~
## ideahap 0.687 0.633 1.085 0.278
## temperatt 0.309 0.352 0.877 0.380
## depress ~
## actuhap -0.280 2.033 -0.138 0.890
## temperatt -5.327 1.965 -2.711 0.007
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.011 0.017 0.693 0.488
## cultatt ~~
## temperatt 0.138 0.076 1.819 0.069
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.091 0.016 5.701 0.000
## actuhap 0.195 0.034 5.701 0.000
## rigoract 2.394 0.420 5.701 0.000
## depress 55.005 9.649 5.701 0.000
## cultatt 1.047 0.184 5.701 0.000
## temperatt 0.340 0.060 5.701 0.000
##
##
## Group 2 [2]:
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt (k1) 0.035 0.020 1.777 0.076
## temperatt (l2) 0.115 0.040 2.918 0.004
## actuhap ~
## cultatt (k4) 0.097 0.035 2.806 0.005
## temperatt (l5) 0.466 0.082 5.710 0.000
## rigoract ~
## ideahap 0.593 0.548 1.081 0.280
## temperatt 0.014 0.188 0.072 0.943
## depress ~
## actuhap -4.514 2.114 -2.136 0.033
## temperatt -4.544 1.780 -2.553 0.011
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.037 0.014 2.695 0.007
## cultatt ~~
## temperatt 0.265 0.096 2.766 0.006
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.050 0.009 5.788 0.000
## actuhap 0.219 0.038 5.788 0.000
## rigoract 1.035 0.179 5.788 0.000
## depress 68.089 11.764 5.788 0.000
## cultatt 1.063 0.184 5.788 0.000
## temperatt 0.514 0.089 5.788 0.000
##
##
## Group 3 [3]:
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## ideahap ~
## cultatt (k1) 0.035 0.020 1.777 0.076
## temperatt (l3) 0.071 0.053 1.330 0.183
## actuhap ~
## cultatt (k4) 0.097 0.035 2.806 0.005
## temperatt (l6) 0.221 0.083 2.668 0.008
## rigoract ~
## ideahap 0.118 0.319 0.370 0.711
## temperatt 0.299 0.157 1.911 0.056
## depress ~
## actuhap -0.079 1.531 -0.051 0.959
## temperatt -7.142 1.224 -5.836 0.000
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## rigoract ~~
## depress 0.000
## ideahap ~~
## actuhap 0.049 0.023 2.171 0.030
## cultatt ~~
## temperatt 0.092 0.073 1.252 0.211
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## ideahap 0.128 0.020 6.442 0.000
## actuhap 0.311 0.048 6.442 0.000
## rigoract 1.087 0.169 6.442 0.000
## depress 61.947 9.616 6.442 0.000
## cultatt 0.799 0.124 6.442 0.000
## temperatt 0.547 0.085 6.442 0.000
anova(avt.group1, avt.group3) #debatable whether that's actually a good thing or not
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## avt.group1 15 3662.9 3824.6 15.149
## avt.group3 19 3663.3 3811.6 23.582 8.4337 4 0.07692 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we’ll allow the effect of temperatt on ideahap and on actuhap to vary across cultures. The group.partial argument overrides the default behavior of the group.equal argument, letting those particular regressions vary across cultures.
avt.group4 = sem(avt.model2, fixed.x = F, data = d, group = "group", meanstructure = F, group.equal = "regressions", group.partial = c("ideahap ~ temperatt", "actuhap ~ temperatt"))
anova(avt.group2, avt.group4)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## avt.group4 27 3653.9 3775.2 30.200
## avt.group2 31 3654.2 3762.0 38.476 8.2765 4 0.08196 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Again, it’s debatable which model is better.
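With several candidate multi-group models in play, it can help to line up their fit indices side by side. A sketch, assuming all four fits above are still in the workspace:

```r
# side-by-side fit indices for the four multi-group fits
fits = list(group1 = avt.group1, group2 = avt.group2,
            group3 = avt.group3, group4 = avt.group4)
sapply(fits, fitMeasures, fit.measures = c("chisq", "df", "aic", "cfi", "rmsea"))
```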